
Use sorted slices to represent HDR histogram #52

Merged
merged 11 commits into elastic:main from sorted-slice-hdr on Jul 31, 2023

Conversation

lahsivjar
Contributor

@lahsivjar lahsivjar commented Jul 28, 2023

An alternative to #49. Uses sorted slices to store HDR histograms; the slices are kept sorted at all times so that merges can be performed efficiently.
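For illustration only, here is a minimal Go sketch of the idea, not the actual implementation in this PR: the histogram is held as parallel sorted bucket/count slices (invented type and field names), and two histograms are merged in a single linear pass over both slices.

```go
package main

import "fmt"

// hist is a hypothetical histogram representation: parallel slices where
// buckets is kept sorted and counts[i] is the count for buckets[i].
type hist struct {
	buckets []int64
	counts  []int64
}

// merge folds src into dst with one linear pass over both sorted slices,
// summing counts when the same bucket appears in both histograms.
func (dst *hist) merge(src *hist) {
	mergedB := make([]int64, 0, len(dst.buckets)+len(src.buckets))
	mergedC := make([]int64, 0, len(dst.counts)+len(src.counts))
	i, j := 0, 0
	for i < len(dst.buckets) && j < len(src.buckets) {
		switch {
		case dst.buckets[i] < src.buckets[j]:
			mergedB = append(mergedB, dst.buckets[i])
			mergedC = append(mergedC, dst.counts[i])
			i++
		case dst.buckets[i] > src.buckets[j]:
			mergedB = append(mergedB, src.buckets[j])
			mergedC = append(mergedC, src.counts[j])
			j++
		default: // same bucket in both: sum the counts
			mergedB = append(mergedB, dst.buckets[i])
			mergedC = append(mergedC, dst.counts[i]+src.counts[j])
			i++
			j++
		}
	}
	mergedB = append(mergedB, dst.buckets[i:]...)
	mergedC = append(mergedC, dst.counts[i:]...)
	mergedB = append(mergedB, src.buckets[j:]...)
	mergedC = append(mergedC, src.counts[j:]...)
	dst.buckets, dst.counts = mergedB, mergedC
}

func main() {
	a := &hist{buckets: []int64{1, 4, 9}, counts: []int64{2, 1, 5}}
	b := &hist{buckets: []int64{4, 7}, counts: []int64{3, 1}}
	a.merge(b)
	fmt.Println(a.buckets, a.counts) // [1 4 7 9] [2 4 1 5]
}
```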

Benchmark (thanks to #50):

name                           old time/op    new time/op    delta
NDJSON/AgentNodeJS-10            33.9ms ± 1%    15.3ms ± 3%  -54.73%  (p=0.000 n=10+10)
NDJSON/AgentPython-10            58.7ms ± 1%    44.1ms ± 2%  -24.76%  (p=0.000 n=9+10)
NDJSON/AgentRuby-10              49.7ms ± 1%    23.7ms ± 2%  -52.36%  (p=0.000 n=9+10)
NDJSON/AgentGo-10                76.2ms ± 1%    26.6ms ± 1%  -65.13%  (p=0.000 n=10+10)
NDJSONParallel/AgentNodeJS-10    33.9ms ± 1%    15.5ms ± 2%  -54.33%  (p=0.000 n=10+10)
NDJSONParallel/AgentPython-10    59.5ms ± 3%    44.1ms ± 2%  -25.82%  (p=0.000 n=10+10)
NDJSONParallel/AgentRuby-10      50.2ms ± 2%    23.4ms ± 3%  -53.42%  (p=0.000 n=10+10)
NDJSONParallel/AgentGo-10        77.0ms ± 2%    26.8ms ± 1%  -65.25%  (p=0.000 n=10+10)

name                           old alloc/op   new alloc/op   delta
NDJSON/AgentNodeJS-10            7.74MB ± 0%    7.58MB ± 0%   -2.04%  (p=0.000 n=10+9)
NDJSON/AgentPython-10            21.4MB ± 1%    20.9MB ± 1%   -2.41%  (p=0.000 n=10+10)
NDJSON/AgentRuby-10              12.0MB ± 0%    11.7MB ± 0%   -1.95%  (p=0.000 n=10+10)
NDJSON/AgentGo-10                15.2MB ± 1%    14.0MB ± 0%   -8.18%  (p=0.000 n=10+10)
NDJSONParallel/AgentNodeJS-10    7.73MB ± 0%    7.59MB ± 0%   -1.76%  (p=0.000 n=10+10)
NDJSONParallel/AgentPython-10    21.4MB ± 1%    20.9MB ± 0%   -2.54%  (p=0.000 n=10+10)
NDJSONParallel/AgentRuby-10      12.0MB ± 0%    11.7MB ± 0%   -1.96%  (p=0.000 n=9+10)
NDJSONParallel/AgentGo-10        15.2MB ± 1%    14.0MB ± 0%   -8.21%  (p=0.000 n=9+9)

name                           old allocs/op  new allocs/op  delta
NDJSON/AgentNodeJS-10             83.4k ± 0%     83.1k ± 0%   -0.36%  (p=0.000 n=10+10)
NDJSON/AgentPython-10              221k ± 1%      223k ± 1%   +0.87%  (p=0.009 n=10+10)
NDJSON/AgentRuby-10                134k ± 0%      134k ± 0%   +0.30%  (p=0.000 n=10+9)
NDJSON/AgentGo-10                  165k ± 0%      162k ± 0%   -1.86%  (p=0.000 n=10+8)
NDJSONParallel/AgentNodeJS-10     83.4k ± 0%     83.1k ± 0%   -0.36%  (p=0.000 n=10+10)
NDJSONParallel/AgentPython-10      222k ± 1%      222k ± 1%     ~     (p=0.165 n=10+10)
NDJSONParallel/AgentRuby-10        134k ± 0%      134k ± 0%   +0.29%  (p=0.000 n=10+7)

More benchmarks:

name                         old time/op    new time/op    delta
AggregateCombinedMetrics-10    1.64µs ± 2%    1.59µs ± 3%   -2.98%  (p=0.000 n=10+9)
AggregateBatchSerial-10        5.44µs ± 5%    4.76µs ± 2%  -12.48%  (p=0.000 n=10+10)
AggregateBatchParallel-10      5.49µs ± 2%    4.83µs ± 3%  -12.04%  (p=0.000 n=8+10)
CombinedMetricsEncoding-10      982ns ± 2%     960ns ± 0%   -2.27%  (p=0.000 n=9+7)
CombinedMetricsToBatch-10       201µs ± 1%      89µs ± 0%  -55.69%  (p=0.000 n=8+10)
EventToCombinedMetrics-10       312ns ± 1%     310ns ± 0%   -0.73%  (p=0.000 n=10+10)

name                         old alloc/op   new alloc/op   delta
AggregateCombinedMetrics-10    1.49kB ± 0%    1.45kB ± 1%   -2.93%  (p=0.000 n=9+10)
AggregateBatchSerial-10        2.66kB ± 4%    2.54kB ± 4%   -4.56%  (p=0.001 n=10+10)
AggregateBatchParallel-10      2.63kB ± 2%    2.55kB ± 4%   -2.89%  (p=0.003 n=9+10)
CombinedMetricsEncoding-10       654B ± 1%      640B ± 1%   -2.09%  (p=0.000 n=9+9)
CombinedMetricsToBatch-10      4.09kB ± 0%    4.08kB ± 0%   -0.27%  (p=0.000 n=9+10)
EventToCombinedMetrics-10       0.00B          0.00B          ~     (all equal)

name                         old allocs/op  new allocs/op  delta
AggregateCombinedMetrics-10      23.0 ± 0%      21.0 ± 0%   -8.70%  (p=0.000 n=10+10)
AggregateBatchSerial-10          31.6 ± 5%      26.1 ± 4%  -17.29%  (p=0.000 n=9+10)
AggregateBatchParallel-10        31.4 ± 2%      26.4 ± 5%  -15.92%  (p=0.000 n=10+10)
CombinedMetricsEncoding-10       0.00           0.00          ~     (all equal)
CombinedMetricsToBatch-10        95.0 ± 0%      95.0 ± 0%     ~     (all equal)
EventToCombinedMetrics-10        0.00           0.00          ~     (all equal)

@lahsivjar lahsivjar marked this pull request as ready for review July 29, 2023 03:25
@lahsivjar lahsivjar requested a review from a team as a code owner July 29, 2023 03:25
aggregators/codec.go (outdated review thread, resolved)
proto/aggregation.proto (outdated review thread, resolved)
@@ -152,3 +139,16 @@ message Overflow {
bytes overflow_service_transactions_estimator = 5;
bytes overflow_spans_estimator = 6;
}

message Bar {
Member


HDRHistogramBar?

IIANM, adding this message will increase the amount of pointer indirection and memory pressure compared to #49.

OTOH I like the approach of keeping the histogram sorted at all times. Did you already consider and investigate doing this with the existing pair of arrays representation?
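For context, a rough sketch of the two Go shapes being weighed here (hypothetical type names; the real generated protobuf structs differ in detail): a repeated Bar message field is generated as a slice of pointers, whereas the existing pair-of-arrays representation keeps two flat slices.

```go
package sketch

// With a Bar message, the generated Go code holds a slice of pointers, so
// every bucket is a separate heap object reached through an extra indirection.
type Bar struct {
	Count  int64
	Bucket int64
}

type histogramWithBars struct {
	Bars []*Bar
}

// With the pair-of-arrays representation, counts and buckets are two flat,
// contiguous slices: no per-bucket allocations or pointer chasing.
type histogramWithArrays struct {
	Counts  []int64
	Buckets []int64
}
```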

Contributor Author


> Did you already consider and investigate doing this with the existing pair of arrays representation?

Good point, I hadn't tried that. I will use counts and buckets instead of Bar for the proto representation.

> OTOH I like the approach of keeping the histogram sorted at all times

Another advantage of this is that it makes the Buckets() calculation more efficient (which is why we see the improvement in CombinedMetricsToBatch w.r.t. time/op).

Contributor Author


Updated the code to remove Bar message from proto definition.

@lahsivjar lahsivjar requested a review from axw July 31, 2023 03:39
Member

@axw axw left a comment


Looks good overall, just wondering if we can further simplify the HistogramRepresentation code.

aggregators/internal/hdrhistogram/hdrhistogram.go (outdated review thread, resolved)
aggregators/internal/hdrhistogram/hdrhistogram.go (outdated review thread, resolved)
for idx := 0; iter.next(); idx++ {
	if bucketsSeen == h.CountsRep.Len() {
		break
	}
	iter.nextCountAtIdx()
Member


I'm wondering if the iterator type is still needed, and whether we could simplify things. Would it be possible to store the bucket value, rather than index, in CountsRep? Then Buckets is simply iterating through and copying to slices.

Contributor Author


> Would it be possible to store the bucket value, rather than index, in CountsRep?

Do you mean saving the highestEquivalentValue for each bucket instead of the index? That value is currently calculated from the bucket and sub-bucket indices, and I think it would require some re-implementation of the core logic.

Contributor Author

@lahsivjar lahsivjar Jul 31, 2023


I might be able to use the bucketIdx to derive the highestEquivalentValue directly without requiring the subBucket (which is the whole reason I kept the iterator). Giving this a shot now.

This doesn't work because we require the subBucketIdx even in this case. I think we might be able to do some refactoring to achieve a simpler state but maybe in a future PR?
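For reference, a small sketch of the standard HdrHistogram index math (modelled on github.com/HdrHistogram/hdrhistogram-go; function and parameter names are illustrative, not this repository's identifiers), showing why reconstructing a value needs both bucketIdx and subBucketIdx:

```go
package sketch

// valueFromIndex reconstructs the lowest value that maps to the given
// bucket/sub-bucket pair; both indices are required because the sub-bucket
// supplies the mantissa and the bucket supplies the shift.
func valueFromIndex(bucketIdx, subBucketIdx, unitMagnitude int32) int64 {
	return int64(subBucketIdx) << uint(bucketIdx+unitMagnitude)
}

// highestEquivalentValue is the largest value that maps to the same
// bucket/sub-bucket pair: the next bucket boundary minus one.
func highestEquivalentValue(bucketIdx, subBucketIdx, unitMagnitude int32) int64 {
	rangeSize := int64(1) << uint(bucketIdx+unitMagnitude)
	return valueFromIndex(bucketIdx, subBucketIdx, unitMagnitude) + rangeSize - 1
}
```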

Member


👍 let's take it to a follow-up

aggregators/merger.go (outdated review thread, resolved)
lahsivjar and others added 3 commits July 31, 2023 13:41
Co-authored-by: Andrew Wilkins <axwalk@gmail.com>
@lahsivjar lahsivjar merged commit 4005592 into elastic:main Jul 31, 2023
2 checks passed
@lahsivjar lahsivjar deleted the sorted-slice-hdr branch July 31, 2023 07:24
Comment on lines +441 to +442
to.Buckets = slices.Grow(to.Buckets, requiredLen)
to.Counts = slices.Grow(to.Counts, requiredLen)
Member


I believe there's a slight issue with slices.Grow here, since slices.Grow(s, n) ensures there's extra space for n more elements. The right call here should be to.Buckets = slices.Grow(to.Buckets, requiredLen-len(to.Buckets)).
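To illustrate the semantics being pointed out (the variable names here are placeholders, not the actual fields in the PR):

```go
package main

import (
	"fmt"
	"slices"
)

func main() {
	buckets := make([]int64, 3)
	requiredLen := 10

	// slices.Grow(s, n) guarantees capacity for n *additional* elements,
	// so passing requiredLen directly over-allocates once len(s) > 0.
	overGrown := slices.Grow(buckets, requiredLen)
	fmt.Println(cap(overGrown) >= len(buckets)+requiredLen) // true

	// Passing the shortfall asks only for enough headroom to reach
	// requiredLen total elements, matching the suggested fix.
	fixed := slices.Grow(buckets, requiredLen-len(buckets))
	fmt.Println(cap(fixed) >= requiredLen) // true
}
```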

Member


Added #56
